Genome-wide CRISPR off-target prediction and optimization using RNA-DNA interaction fingerprints

The powerful CRISPR genome editing system is hindered by its off-target effects, and existing computational tools achieved limited performance in genome-wide off-target prediction due to the lack of deep understanding of the CRISPR molecular mechanism. In this study, we propose to incorporate molecular dynamics (MD) simulations in the computational analysis of CRISPR system, and present CRISOT, an integrated tool suite containing four related modules, i.e., CRISOT-FP, CRISOT-Score, CRISOT-Spec, CRISORT-Opti for RNA-DNA molecular interaction fingerprint generation, genome-wide CRISPR off-target prediction, sgRNA specificity evaluation and sgRNA optimization of Cas9 system respectively. Our comprehensive computational and experimental tests reveal that CRISOT outperforms existing tools with extensive in silico validations and proof-of-concept experimental validations. In addition, CRISOT shows potential in accurately predicting off-target effects of the base editors and prime editors, indicating that the derived RNA-DNA molecular interaction fingerprint captures the underlying mechanisms of RNA-DNA interaction among distinct CRISPR systems. Collectively, CRISOT provides an efficient and generalizable framework for genome-wide CRISPR off-target prediction, evaluation and sgRNA optimization for improved targeting specificity in CRISPR genome editing.

e For a given sgRNA, the genome-wide off-target sites are scored by CRISOT-Score, and those high-scored off-target sites are summarized to derive a specificity evaluation schema -CRISOT-Spec.CRISOT-Spec can be used to evaluate the targeting specificity and the quality of the sgRNA.f When the specificity of an sgRNA is not satisfactory, CRISOT-Opti introduces single nucleotide mutations to change the offtarget profile.By comparing different mutated sgRNAs, CRISOT-Opti can determine the optimized sgRNA as a substitution for the wild-type (WT) sgRNA.
We compared the performance of CRISOT-FP in CRISPR offtarget prediction with three state-of-the-art feature encoding schemes, i.e., Crista_feat, One-hot and Two-hot encoding.Crista_feat features are the features used by CRISTA 32 , which contain a wide range of features, including genomic content features, thermodynamics of the sgRNA, and the pairwise sgRNA-target similarity et al.One-hot and Two-hot encoding methods converts every bp of the sgRNA-target hybrid into a 16-bit and 8-bit vector (Fig. S2).We performed two leave-ones-out tests to evaluate CRISOT-FP and the existing encoding methods (Fig. S3).The leave-group-out (LGO) test randomly held out 1/5 of the inputs as testing data, so that the training and testing datasets contain all sgRNAs and different sequences of their corresponding on-/off-target sites.The leavegroup-out (LGO) test randomly held out 1/5 of the sgRNAs and the corresponding on-/off-target sequences as testing data, so that training and testing datasets contained different sgRNAs and offtarget sequences.In this case, LSO is taken as a stricter and more challenging prediction task compared to LGO.The LGO test estimated the prediction on unseen off-target sequences, whereas the LSO test ensured that the sgRNAs and off-targets were totally different between training and testing datasets.
We applied the XGB 33 classification algorithm for both tests (Fig. 2d, e and Fig. S4) for its great prediction performance and model interpretation ability (see Methods).For the LGO tests, both CRISOT-FP and the sequence-based encodings showed good performance in terms of area under curve (AUC) analysis.CRISOT-FP achieved slightly better receiver operating curve (ROC) performance than the previous features, with ROC-AUCs higher than 0.984 (Fig. 2d).The precisionrecall (PR) analysis indicated that CRISOT-FP achieved remarkably higher PR-AUCs than the other encoding methods, with PR-AUC scores of 0.698 and 0.690 for Change-seq and Site-seq datasets, respectively (Fig. 2e).For the LSO tests, CRISOT-FP still performed better than the previous features, and the differences in both ROC and PR were more remarkable than those in the LGO tests (Fig. 2d, e).Furthermore, we also tested CRISOT-FP and the other feature encoding methods by using different machine learning algorithms (Fig. S5).In this case, CRISOT-FP trained within the XGB algorithm outperformed the other features and the other machine learning algorithms.These results

Base pair features for encoding
Represented by: sgRNA vs. Target sequence  indicate the great potential of CRISOT-FP to identify real off-targets among the background sequences in a highly imbalanced scenario.Therefore, the XGB models trained on the Group I datasets in this section were used for further evaluations on the independent datasets.
Comparison of CRISOT-FP with state-of-the-art off-target prediction methods.We further evaluated the ability of CRISOT-FP to predict genome-wide off-target effects in independent testing datasets (the Group II datasets, Table 1, see Methods) with no overlaps with the training datasets, and compared with state-of-the-art off-target prediction methods.The score values of CRISOT-FP were the means of the leave-ones-out models trained on Group I datasets described in the first benchmark study.Four hypothesis-driven off-target scoring tools (KinPred 34 , CRISPRoff 5 , uCRISPR 6 and CFD 8 ) and four learning-based off-target prediction tools (MOFF 35 , CRISPRnet 10 , DLcrispr 11 and CNN_std 36 ) were used for comparison.The learning-based models were taken from their original studies that were trained on their own datasets.Two of the compared learning-based methods, i.e., DLcrispr and CRISPRnet models, were trained on datasets containing several sgRNAs that were included in the testing datasets.For fair comparisons, we presented both the results on the complete testing datasets and the results excluding the overlapped sgRNAs for these models (Fig. 3 and Fig. S6-S11).In the case that all testing sgRNAs were contained in the training datasets of the compared methods, we have indicated this case by using the bars in gray.In the case that the testing sgRNAs partially overlapped with the training datasets of the compared methods, we have presented additional bars in slash hatch.The overlapped sgRNAs are summarized in Supplementary Data 3. Since the off-target datasets were highly imbalanced, the PR-AUC results rather than ROC-AUC were more convincing.Three benchmarking scenarios were performed.
In the first evaluation scenario, we tested the performance of CRISOT-FP in predicting the in vitro Circle-seq datasets (Fig. 3a, Figs.S6, S7).Although the performances of DLcrispr and CRISPRnet (the gray bars in Fig. 3a and Fig. S7) were higher/comparable to those of CRISOT-FP, this is attributed to that the testing sgRNAs were already contained in their training datasets, thus the colors of their bars were changed to gray to indicate potential data leakage.The MOFF model achieved the best performance among the other compared methods.The PR-AUC results showed that the MOFF model achieve comparable performance on circleseq_u2os dataset, while CRISOT-FP surpassed it in the other three datasets.The fair comparison of CRISOT-FP with the MOFF and CNN_std models and the hypothesis-driven methods showed that CRISOT-FP outperformed them, with an average PR-AUC of 0.46 (p < 0.05, Fig. S7c).Therefore, the CRISOT-FP achieved better performance than the compared methods in predicting the in vitro offtarget datasets.
In the second evaluation scenario, we used the in cell datasets (Table 1, see Methods) for evaluation, which include the Guide-seq 37,38 , Surro-seq 39 and TTISS 40 datasets that are independent to the training datasets.The results (Fig. 3b, Figs.S8, S9) showed that CRISOT-FP outperformed all learning-based models and hypothesis-driven methods on the PR-AUC results (p < 0.05, Fig. S9c), even though data leakage may exist because the training datasets of the DLcrispr and CRISPRnet models contained sgRNAs that were in the testing Guideseq datasets (Supplementary Data 3).For a fair comparison, we removed the sgRNAs in the testing datasets that were contained in the training datasets of DLcrispr and CRISPRnet, and the same sgRNAs were removed in the testing datasets for CRISOT-FP.The results showed that excluding the overlapped sgRNAs slightly decreased the performances of DLcrispr and CRISPRnet while increased those of CRISOT-FP (bars with slash hatch in Fig. 3b and Fig. S9).Therefore, the CRISOT-FP achieved better performance than the compared methods in predicting the in cell off-target datasets.
In the third evaluation scenario, we set 6 fold-change (FC) cutoffs to the targeted dataset (the Surro-seq 39 dataset) to obtain 6 datasets with various sensitivities (see Methods).The results (Fig. 3c, Figs. S10,  S11) showed that the performances of all methods but CRISPRoff slightly increased as the FC cutoff increased.CRISOT-FP surpassed all learning-based models and hypothesis-driven methods on the PR-AUC results (p < 0.05, Fig. S11c).The results indicated that CRISOT-FP achieved better performance than all compared methods in predicting the targeted Surro-seq datasets with various sensitivities.
Taking together, the benchmark validation results indicated the high generalization capability of CRISOT-FP, and that CRISOT-FP outperformed the existing learning-based and hypothesis-driven methods in predicting the off-target effects of Cas9.
CRISOT-Score provides effective off-target scoring of Cas9 system Feature importance analysis and the designing of CRISOT-Score.We started our study by identifying the key features related to CRISPR off-target scoring.To this end, the XGB models were connected with SHapley Additive explanation (SHAP) algorithm 41 to identify the key interactions among all the features used in CRISOT-FP (see Methods).A higher SHAP importance value indicates a higher contribution of the feature to the predicted result.Some of the key features showed high SHAP importance values at most bp positions, indicating that these features and the related RNA-DNA molecular interactions were shared by different positions of the RNA-DNA hybrid in Cas9 (Fig. 4a,   Supplementary Data 2).The sum of all SHAP importance values at each position indicated that the PAM-proximal positions were more important than the PAM-distal positions (Fig. 4a, left panel), which is consistent with previous studies on the molecular mechanism of Cas9 42,43 .The results also indicated that positions 5, 8 and 9 were more important among the PAM-distal positions, which was consistent with the previous study using a free energy model 44 .
The interaction features were sorted by the sums at all positions (Fig. 4b).Among the top 5 features, E_surf (nonpolar solvation free energy), resA@N1 resB@N1 (the distance between N atoms on the bases of RNA and DNA nucleotides) and hbond_bp5 (hydrogen bonds) were related to the binding stability of the hybrid, indicating the importance of the binding stability to the activation of Cas9.These results were consistent with those in previous studies 14,43,45 .The curvature of the RNA-DNA hybrid is important to Cas9 and might affect the binding affinity of Cas9 with the RNA-DNA hybrid because the curvature of DNA was proven to influence the protein-DNA affinity 46 .We also summarized the contributions of different types of interaction features (Fig. 4c).The results showed that the hydrogen bonding feature contributed up to 4.3% (at position 12) to the model, even though the hydrogen bonding feature contained only one feature.The atom position and binding free energy features contributed averages of 54.1% and 23.5%, respectively, indicating the importance of the atom position features and the potential molecular mechanisms other than binding stability.In fact, we noticed that the positions of some atoms on the DNA nucleotide indicated by the distance and angle features (resB@C1' resB@N9, ang:resB@C1' resB@O5' resB@C3'_cos, ang:resB@O5' resB@N9 resB@C1'_cos, resB@P resB@C5' and resB@C5' resB@O5') were among the top 10 key features (Fig. 4b).The positions of these atoms might influence the interactions between the sgRNA-DNA hybrid and specific residues in Cas9, thus affecting the cleavage.
Finally, we used the key interaction features to design a flexible off-target scoring function, i.e., CRISOT-Score.For each position, we collected the top features (Supplementary Data 2) and their SHAP score values (Supplementary Data 2).These values were then aggregated to score specific bp types at specific positions.For example, to calculate the score of AA (sgRNA vs. off-target sequence) at position 5 using the top 5 key features, the mean SHAP score values related to the top 5 features at position 5 were collected, and their products were then aggregated as a final score (see Methods).We then determined how many features were needed to achieve considerable off-target prediction performance.Score maps were computed using the top 1 to the top 30 features at each position.The off-target scoring performances on different datasets were compared to those of the CRISOT-FP models (Fig. 4d and Fig. S13).As the feature number increased, the performance of CRISOT-Score increased sharply and achieved average ROC-AUC and PR-AUC scores of 99.2% and 76.0%, respectively, when the top 16 features were used.The ROC-AUC score remained at a high level, and the PR-AUC score slightly increased as the features increased.Therefore, we selected the top 24 features (Fig. S12) of each position to construct CRISOT-Score.The score map (Fig. S14) indicated that different positions favored different complementary bps.For example, positions 11 and 13 preferred AA over TT, positions 19 and 20 preferred TT over AA, and position 20 preferred CC over GG 47,48 .Different positions varied greatly in their sensitivity to mismatches.The PAMdistal positions, especially positions 1 and 2, were not sensitive to mismatches, whereas positions 13 to 17, which are within the seed region, were sensitive to mismatches.
Comprehensive evaluations of CRISOT-Score.The effectiveness of CRISOT-Score are evaluated comprehensively.In the first evaluation, CRISOT-Score was compared with 7 state-of-the-art hypothesis-driven off-target scoring methods including KinPred 34 , CRISPRoff 5 , uCRISPR 6 , MIT 7 , CFD 8 , CCTop 49 and CROPit 50 (learning-based methods were not included).We compared their performance on the Group II benchmark datasets (Table 1) with the measurements of ROC and PR (Fig. 4e, Fig. S15 and Fig. S16).CRISOT-Score significantly surpassed all seven existing off-target scoring methods in predicting off-target activities of all benchmark datasets (p < 0.05, Fig. S16).Of note, KinPred and CRISPRoff are mechanism-based scoring method that were derived from kinetic models 34 and the binding energy of the Cas9-sgRNA-DNA complex 5 .In comparison, CRISOT-Score included not only the binding free energy features, but also the atom position, hydrogen bonding and base pair/step geometric features that systematically reflect the interactions of the Cas9-sgRNA-DNA complex.Therefore, incorporating more details of the molecular interactions of Cas9 could benefit the prediction of off-target effects.
In the second evaluation, we tested how CRISOT-Score scores the active and inactive off-targets (Fig. 4f and Fig. S17).Among all potential off-target sites of the datasets, the experimentally validated off-target sites were defined as active off-targets, whereas the rests were inactive off-targets.The violin plots showed the CRISOT-Score distributions of the active vs. inactive off-targets (Fig. 4f).For the in vitro Change-seq, Site-seq and Circle-seq datasets, the CRISOT-Score of the active offtargets distributed generally distinct from the inactive off-targets.The peaks of the active off-targets were near the peaks of the inactive offtargets.In comparison, the CRISOT-Score of the active off-targets distributed greatly distinct from those of the inactive off-targets on the in cell Guide-seq and TTISS datasets.The peaks of the active off-targets of these datasets were approximately 0.7, and few active off-targets were scored less than 0.6.The results indicated that CRISOT-Score identified the active and inactive off-targets well, especially on the Guide-seq and TTISS datasets.In comparison, all of the other off-target scoring methods, but CRISPRoff, were not able to identify the active and inactive off-targets (Fig. S17).
In the third evaluation, we tested the proportions of active offtargets at different ranges of CRISOT-Score (Fig. 4g and Fig. S18).Generally, the frequency of active off-targets increased as the CRISOT-Score increased for all of the Group I and Group II datasets.The average frequencies summarized in Fig. 4g showed that the frequencies of active off-targets at CRISOT-Score ranges greater than 0.8 were higher than 0.9, and the frequencies of active off-targets at CRISOT-Score ranges lower than 0.5 were almost 0. It was indicated that a CRISOT-Score value that is higher than 0.8 almost indicates an active off-target, and a CRISOT-Score that is less than 0.5 almost indicates an inactive off-target.In comparison, CRISPRoff scores (Fig. S19) did not show a tendency that higher score sites were more likely to be active off-targets, and the frequency of active off-targets at the highest score range was 0, indicating its poorer scoring ability over CRISOT-Score.
CRISOT-Spec provides effective sgRNA targeting specificity evaluation of Cas9 system While CRISOT-Score evaluates the off-target effect of a specific offtarget site, a specificity score is commonly designed to evaluate the specificity of a given sgRNA in terms of its genome-wide off-target profile.There are several CRISPR specificity scoring tools, e.g., MOFFaggregrate 35 , CRISPRspec 5 , MIT 7 and CFD 8 , which aggregate the offtarget scores of the genome-wide off-target sites and calculate summary scores as the specificity scores.Thus, the existing specificity scores may relate to the number of potential off-target sites, i.e., a greater number of potential off-target sites usually results in a poorer specificity score.However, only a small fraction of the potential offtarget sites are active off-target sites.As shown in Fig. 5a, the fractions of active off-targets were almost zero when the CRISOT-Score values were less than 0.5, while were almost one when CRISOT-Score higher than 0.85.Take the higher and lower specificity sgRNAs in Change-seq datasets (Fig. 5b and Fig. S20) as an example, although their total numbers of potential off-target sites were similar, the lower specificity sgRNAs have a greater number of off-target sites that were scored higher than 0.5.That meant severe off-target effects could occur if a significant portion of the potential off-target sites scored higher than 0.5, indicating that during the evaluation of the genome-wide off-target profile of a given sgRNA, more attention should be given to those with a higher CRISOT-Score.
To this end, we developed a specificity scoring method, i.e., CRISOT-Spec.For a given sgRNA, CRISOT-Spec searched its genomewide off-target sites and calculated their CRISOT-Score values.The numbers of potential off-target sites at different ranges of CRISOT-Score values (Fig. 5b) were then calculated, and multiplied by the relative probability of being active off-targets (Fig. 5a).The product values were aggregated to derive the CRISOT-Spec values as a specificity score (see Methods).Because the probability of active off-targets with CRISOT-Score values lower than 0.5 was almost zero, CRISOT-Spec focused on off-target sites with CRISOT-Score values higher than 0.5, which were the most likely active offtargets.
To evaluate the performance of CRISOT-Spec, the CRISOT-Spec specificity score was compared to MOFF-aggregate, CRISPRspec, MIT and CFD specificity scores.The specificity scores were compared to the number of experimentally validated off-target sites in Groups I and II datasets.Off-target cleavage of a specific site would lead to great issues in clinical therapies, even when the cleavage efficiency is low.Thus, the number of validated off-target sites is important to evaluate the specificity of an sgRNA.CRISOT-Spec correlated well with the experimentally validated off-target sites (Fig. 5c) and achieved the best performance in predicting the off-target read fraction of the Site-seq dataset, with Spearman's correlation of −0.853.When compared with the other specificity score approaches, CRISOT-Spec significantly surpassed the CRISPRspec, MIT and CFD specificity scores (p < 0.05, Fig. 5d, e and Fig. S21), and was comparable to the only learning-based method, the MOFF-aggregate.
CRISOT-Opti optimizes a sgRNA to reduce off-target effects while maintaining its on-target effects Despite considerable on-target efficiency, an sgRNA may still have severe off-target effects (Fig. 5).In fact, the CRISPR system can tolerate a few mismatches, and the influence of a single mismatch on the targeting efficiency may be small 8,47 .In contrast, a single mutation in the sgRNA may significantly change the off-target profile and thus the specificity of a sgRNA.As the saying goes, "Better a diamond with a flaw than a pebble without".CRISOT-Opti was designed to find out such "a diamond with a flaw" -an sgRNA not perfectly match the target while present better targeting specificity.For a given sgRNA, CRISOT-Opti introduced 3 kinds of mutations to each of the 20 nucleotides (Fig. 5f).These mutated sgRNAs were paired with the target sequence to calculate the CRISOT-Score values.Although different types of mutations on the sgRNA generally decreased the CRISOT-Score value (Fig. S22), the results of CRISOT-Score indicated that a sequence with CRISOT-Score value of higher than 0.8 was always cleaved (i.e., active off-target) in most off-target detection experiments (Fig. S18), and a sequence with CRISOT-Score value of higher than 0.85 was cleaved in all of the off-target detection experiments.Therefore, to ensure considerable cleavage of the target DNA, we can set the of CRISOT-Score to 0.8, 0.85 or higher, so that CRISOT-Opti only considered mutated sgRNAs with CRISOT-Score values higher than the threshold.Since the sgRNAs were mutated, the off-target profiles were changed.CRISOT-Opti calculated the CRISOT-Spec scores of the mutated sgRNAs based on their new off-target profiles.The scores were compared to the score of the wild-type (WT) sgRNA.Because higher CRISOT-Spec values indicate higher targeting specificity of an sgRNA, those mutated sgRNAs with increased CRISOT-Spec could be considered better substitutions for the WT sgRNA.
Take the GAGTCCGAGCAGAAGAAGAAGGG sequence on the human EXM1 gene as an example of sgRNA optimization, which was also investigated by previous Guide-seq study 37,38 (Fig. 5g, h).The WT sgRNA could cleave several off-target sites (CRISOT-Spec = 0.364), especially the three sites with CRISOT-Score values higher than 0.75.In this case, CRISOT-Opti mutated each nucleotide of the 20-nt sgRNA.Among a total of 60 mutated sgRNAs, 38 sgRNAs with CRISOT-Score values higher than 0.85 were chosen for the specificity evaluation (Supplementary Data 4).The CRISOT-Spec scores indicated that the A11 > C mutation (A at position 11 of the sgRNA was mutated to C) was the best substitution, of which the CRISOT-Spec score increased to 0.720, while the CRISOT-Score is still high at 0.867.In comparison, if the sgRNA was improperly mutated (e.g., C6 > A in Fig. S23), the targeting specificity would greatly decrease.Similarly, when targeting another site (e.g., GGGAAA-GACCCAGCATCCGTGGG in Supplementary Data 4), we also found an optimized substitution using CRISOT-Opti.
To validate the optimization result, we performed Guide-seq screening to detect genome-wide off-targets of the WT and A11 > C sgRNAs (Supplementary Data 5).Although the on-target efficiency was reduced when the mutated sgRNA was used instead of the WT sgRNA, the mutated sgRNA still maintained sufficient on-target efficiency (Fig. 5h, lower panel), which is consistent with the previous study that A11 > C sgRNA would maintain about 38% targeting activities 38 .On the other hand, the results showed that the A11 > C sgRNA greatly reduced off-target editing (Fig. 5h, i).The number of off-target reads was reduced from 51967 to 1072 by approximately 50 folds.The off-target read fraction was significantly reduced from 0.653 to 0.162, and the active off-target sites from 57 to 11.Most of the off-target sites were either eliminated or greatly reduced.For example, the off-target site on gene HCN1, which detected 35129 off-target reads when the WT sgRNA was used, detected a 281-fold reduction when the mutated sgRNA was used; and the off-target site on gene SEMA5A, which detected 3064 off-target reads when the WT sgRNA was used, was totally eliminated when the mutated sgRNA was used.The HCN1 gene is involved in spontaneous rhythmic activity in both heart and brain 51 , and the SEMA5A gene contributes to axonal guidance during neural development 52 .The optimized sgRNA well protected these genes to be targeted, which will greatly reduce side effects when using the CRISPR-Cas9 system in clinical trials.
Experimental validation of sgRNA optimization for two important therapeutic genes PCSK9 and BCL11A using CRISOT In this study, we further performed additional proof-of-concept experimental validation to demonstrate the effectiveness of applying CRISOT for off-target evaluation and sgRNA optimization in gene therapy.To this end, two important therapeutic genes, PCSK9 and BCL11A, are tested here.CRISPR has been used to knockdown the PCSK9 gene in primates to lower the level of cholesterol 53 , and the editing of the BCL11A gene to treat transfusion-dependent β-thalassemia and sickle cell disease are in clinical trials 54 .The safety of CRISPR editing on these genes are of high concern.As a proof-of-concept example, we used CRISOT to evaluate the off-target profile of the gene editing and optimize two sgRNAs that have unsatisfying specificities targeting the PCSK9 and BCL11A genes.We performed whole genome screening (WGS) experiment, an unbiased and direct method for c The ratio of feature importance of various types of features on different positions.d PR-AUC Performances of CRISOT-Score developed using different numbers of key RNA-DNA molecular interaction features for each position.Results on the different benchmark datasets are compared to the results achieved by the CRISOT-FP models.e Comparison of CRISOT-Score with the state-of-the-art off-target scoring methods.The PR-AUC results indicate that CRISOT-Score surpasses the existing off-target scoring methods.f Violin plots of CRISOT-Score scores on active and inactive off-target sites of different datasets.The numbers of active and inactive offtarget sites can be found in Table 1.Among all potential off-target sites of the datasets, the experimentally validated ones were defined as active off-targets, and the rests were inactive off-targets.The three dashed lines in each violin show the quartiles of the data.g Frequencies of active and inactive off-target sites that are in different CRISOT-Score ranges.The presented result is averaged from Groups I and II benchmark datasets (Fig. S18, n = 11).Source data are provided as a Source Data file.
assessing off-target effects of Cas9, on the HEK293T cell line to experimentally validate the off-target profiles of sgRNAs.
We selected two optimized sgRNAs for PCSK9 (G10 > T mutation) and BCL11A (G11 > T) using CRISOT, with CRISOT-Score scores of about 0.8 (Fig. 6a and Fig. S24).We performed WGS experiments and found that the off-target effects of the optimized sgRNAs were greatly reduced, while the on-target frequencies were maintained in a considerable level (Fig. 6a, Fig. S25 and Supplementary Data 6).Notably, the on-target frequency (0.351) of the optimized sgRNA for PCSK9 was comparable to that of the WT sgRNA (on-target frequency = 0.365).Collectively, this study indicates that CRISOT can be used for optimized sgRNA design to improve the gene editing safety, while maintaining the on-target efficacy of CRISPR therapeutics.

Experimental Counts of Off-target Sites
CRISOT-Spec Score CRISOT facilitates genome-wide off-target prediction for base and prime editors Base (BEs) and prime editors (PEs) enabled precise editing of the target sequence without DNA double-strand breaks (DSBs), while they also involve the interaction of RNA-DNA hybrid.Although not quite proper, off-target prediction methods of Cas9 were usually used to evaluate the off-target effects of BEs and PEs including BE3, ABE7.10 and PE2 etc.
Here, we curated Group III (PE2 55 , ABE7.10 56,57 and BE3 57,58 ) off-target datasets (Table 1, see Methods), and evaluate the performance of CRISOT in predicting the off-target effects and targeting specificities of BEs and PEs 59 since CRISOT captures the common mechanisms of these various CRISPR systems by characterizing the interaction of RNA-DNA hybrid.The Group III datasets were independent to the training (Group I) datasets.Using these datasets, we found that CRISOT achieved better performance than the compared methods in predicting the off-target effects of BEs and PEs as expected (Fig. 6b and Fig. S26, S27).For the learning-based models, although they achieved comparable performance with CRISOT-FP on the ROC-AUC results, CRISOT-FP outperformed MOFF, CNN_std and CRISPRnet models on the PR-AUC results.The CRISOT-FP also surpassed DLcrispr after removing the overlapped sgRNAs to avoid data leakage of DLcrispr (p < 0.05,  Fig. 6b and Fig. S27).For the hypothesis-driven models, CRISOT-Score surpassed the existing hypothesis-driven off-target scoring methods in terms of PR-AUC and ROC-AUC (p < 0.05, Fig. 6b and Fig. S27).We also tested the performance of CRISOT-Spec in predicting the targeting specificities of BEs and PEs (Fig. 6c and Fig. S28).CRISOT-Spec achieved the best performance in predicting the targeting specificity of the ABE7.10 and BE3 datasets.Although the performance of MIT and MOFF-aggregate specificity scores were higher than CRISOT-Spec in the PE dataset, the MOFF-aggregate is a learning-based method and CRISOT-Spec greatly surpassed them in the ABE7.10 and BE3 datasets.Collectively, the application of CRISOT in BEs and PEs indicated that CRISOT achieved good generalizability in the prediction of the off-target effects and targeting specificities of BEs and PEs.

Discussions
The molecular mechanism of Cas9-mediated DNA cleavage is driven by the interactions of the Cas9-sgRNA-DNA complex 14,15 .In this study, we systematically analyzed the RNA-DNA molecular interaction features using MD simulations, and the CRISOT suite was developed based on the RNA-DNA molecular interaction fingerprints.The CRISOT suite is designed as a generalizable system for off-target prediction, targeting specificity evaluation and sgRNA optimization by the integration of artificial intelligence (AI) and MD technology.In summary, CRISOT can be taken as a one-stop platform to greatly improve in silico sgRNA design and reduce off-target effect for various CRISPR system by investigating the molecular interaction mechanism.Various applications were presented to demonstrate the utility of CRISOT.The CRISOT suite was used to optimize two sgRNAs targeting the PCSK9 and BCL11A genes.The WGS off-target detection experiments validated the optimization results and indicated that CRISOT could help users to find an optimized sgRNA with almost no sacrifice of on-target efficacy.In addition, the CRISOT suite can also be used to evaluate the off-target effects of BEs and PEs, indicating the generalizability of CRISOT among distinct CRISPR systems.
Despite the success of CRISOT, several improvements are expected in future work.First, this study only considered off-targets with NGG PAMs and without DNA/RNA bulges.Apart from the canonical NGG PAM, Cas9 has been reported to cleave off-target sequences with alternative PAM sequences, such as NAG and NGA 60 , although the cleavage rates were much smaller than those of the canonical PAM sequences.Future work should take into account various PAM sequences and DNA/RNA bulges.Second, our RNA-DNA molecular interaction features and CRISOT-FP could provide a deep understanding of the molecular mechanism of Cas9; however, the molecular mechanism of Cas9, especially the allosteric and cleavage mechanisms, contains molecular interactions that could not be reflected by the RNA-DNA hybrid.More comprehensive simulations and investigations on the Cas9-sgRNA-DNA complex are still required.Thirdly, features reflecting the target accessibility, e.g., chromatin-opening information and DNA methylation information, would improve the performance if they were properly augmented, however, such information are not guaranteed to be obtained.Therefore, we did not consider such information in this version to make CRISOT tools generally usable.Fourthly, we trained XGB models because they are simple and powerful, suitable for training tabular data, and provide great interpretations of the model.Various complex machine learning algorithms, especially deep learning algorithms can be incorporated in CRISOT in a flexible way in the future.Finally, the validations of our study are mainly performed in silico in the current stage and the experimental validation is served as a complementary.Additional large-scale experimental validations on sgRNAs targeting important therapeutic genes using Guide-seq, Change-seq, WGS etc. will make the results more convincing, while it is waiting to be performed in the future.

Preparation of RNA-DNA hybrids
We designed a of RNA-DNA hybrids containing 256 RNA-DNA hybrids (Fig. 2a) to calculate the RNA-DNA interaction features.Sixteen types of RNA-DNA base pairs, including matched and mismatched ones, were designed for feature calculation.Two additional base pairs were added up-and down-stream of the base pair (4 * 4 types, matched only), forming 4*16*4 = 256 types of 3-mer RNA-DNA hybrids for feature calculation to make the resulted features more representative.Finally, because the terminal base pairs were more likely to break during the simulations, we added 3-mer padding base pairs (rG-dC, the most stable base pair) up-and down-stream of the core base pairs to stabilize the simulation system.The original structure of the 9-bp RNA-DNA duplex was extracted from a high-resolution crystal structure of the Cas9-sgRNA-DNA ternary complex (PDB [5y36]) 13 .We used the web 3DNA platform 61 to perform the three steps of nucleic acid base mutations to construct initial RNA-DNA hybrid structures for MD simulations.First, the original sequence was mutated to construct a (r5'-GGG-GGG-GGG)(d3'-CCC-CCC-CCC) hybrid (GGG-CCC, which indicates the 3 bps in the middle).Then the fourth and sixth bps were mutated, and various Watson-Crick bps were maintained, to construct 16 DGD-RCR hybrids (D and R represent DNA and RNA nucleotides, respectively).Finally, the fifth bp was mutated, and various nucleic acid bases were generated to construct a total of 256 9-bp hybrids.Short MD simulations (10 ns) were performed to optimize the structures before and after the mutations.The constructed hybrid structures varied only in the 4th-6th bps, in which only the middle bps contained bp mismatches.

Molecular dynamics simulations
The all-atom RNA-DNA hybrids were solvated in TIP3P explicit water, and Na+ and Cl-ions were added to neutralize the systems.The DNA strands adopted the AMBER/parmbsc1 force field 62 , whereas the RNA strands adopted Shaw's force field 63 .The combination of the parmbsc1 force field and Shaw's force field demonstrated good quality in MD simulations of RNA-DNA hybrids 64 .All MD simulations were performed with the GROMACS 2016 package 65 .The systems were first prepared by energy minimization, followed by 1 ns NVT and 1 ns NPT simulations with the RNA-DNA hybrids fixed.The systems were then equilibrated by 20 ns NPT simulations.The production simulations were performed at 310 K for 100 ns.The simulation time step was set to 2 fs, and the simulation trajectories were collected every 10 ps.Electrostatic interactions were calculated using the PME algorithm, with a cutoff radius of 1.2 nm.The temperature coupling was calculated using a Berendsen thermostat, with a coupling time of 0.1 ps.Lincs constraints were applied to all bonds.The production simulations were performed for 3 structures randomly chosen from the equilibration simulations.Therefore, the final features for each type of base pair were calculated based on 16*3*100 = 4800 ns of MD simulations.The MD simulations were performed using Xeon E5-4640 CPU on the National Supercomputer Center in Guangzhou, China.The computational time for a single production run was about 20.5 hours, and the total computational time for all simulation systems was about 15,744 h.

MD trajectories analysis
Three replications of the MD trajectories of each RNA-DNA hybrids were analyzed to calculate the interaction features.An analysis of the base pair (e.g., shear, stretch, and stagger) and base step (e.g., twist, rise, and slide) parameters was performed using the do_x3dna package and dnaMD Python module 66 .The atom position features represented by distances, angles and dihedral angles of atoms were calculated using pytraj, a python package of CPPTRAJ 67 from AmberTools.P, C5', C3', O5', C1, N9 and N1 atoms and P, C5', C3', O5', C1, N1 and N3 atoms were selected for purines and pyrimidines, respectively (Fig. S29).The position parameters were calculated using the pytraj packages.
Hydrogen bonds were calculated using the GROMACS package 65 .The binding free energies and free energy decompositions of the RNA-DNA hybrids were calculated using Molecular mechanics/Generalized-Born Surface Area (MM-GBSA) method using the gmx_MMPBSA 40 program.The gmx_MMPBSA program is a GROMACS tool based on AMBER's binding free energy calculation engine to perform free energy calculations with GROMACS files.

Feature engineering
The feature values varied between different types of bps, but were similar when the nearest bps changed (Fig. S1).However, some feature values may vary greatly among the different nearest bps (e.g., CA's hydrogen bonds in Fig. S2a).Therefore, we compared the correlation among features derived from different nearest bps (Fig. S1, right panels).The boxplots of Pearson's correlation scores of all features indicated that the mean values correlated best to different types of nearest bps (Fig. 2b).The mean values were calculated to represent the 16 bp types.These features were then engineered.For angles, dihedral angles and other angle features in the base pair/step geometric features, the values might not properly represent their differences.To this end, we took the sin and cos values of the angles instead, and the other features were scaled to a range of −1 to 1. Variances of the features were calculated and the features with variances less than 0.01 were eliminated.The feature engineering resulted in a total of 193 features (Supplementary Data 1), including 1 hydrogen bonding feature, 42 binding free energy features, 114 atom position features and 36 base pair/step geometric features, which were used to encode sgRNA-DNA hybrids (Fig. 2c).

CRISPR off-target benchmark datasets
We constructed nine off-target benchmark datasets that contain data detected by different experiments (Table 1).The first two datasets (Group I) are in vitro datasets, namely the Change-seq and Site-seq datasets, which contain sgRNAs that are different from each other, are developed for training.The Change-seq dataset 68 contains 110 sgRNAs accounting for 21545 validated off-targets.The Site-seq dataset 14 contains 12 sgRNAs accounting for 2047 validated offtargets.
The second group of datasets (Group II) contained four datasets that are independent to the Group I datasets, namely the Circle-seq, Guide-seq, Surro-seq and TTISS datasets, are used as independent testing datasets to validate the models.The in vitro Circle-seq dataset 60 , is comprised of Circle-seq data of three cell types, i.e., the K562 dataset containing 6 sgRNAs and 751 off-targets, HEK293 dataset containing 4 sgRNAs and 857 off-targets, and U2OS dataset containing 5 sgRNAs and 2844 off-targets.The in cell Guide-seq dataset, is comprised of Guide-seq data from two independent studies, i.e., Tsai's dataset containing 10 sgRNAs and 341 off-targets 37 and Listgarten's dataset containing 23 sgRNAs and 53 off-targets 38 .The Surro-seq dataset 39 is a targeted in cell dataset that contains 105 sgRNAs and 714 validated off-targets.Because the cutoff of fold-change (FC) would change the sensitivity of the Surro-seq dataset, we set 6 FC cutoffs (2,  4, 8, 16, 32, and 64), and ignored those sites with p-values > 0.05, which resulted in various datasets with different sensitivity (validated/ potential ratios: 819/6540, 549/6741, 371/6763, 263/6764, 207/6764 and 174/6764, respectively).The in cell TTISS dataset 40 contains 59 sgRNA and 866 off-targets.
The last three datasets (Group III) contained in vitro off-target datasets of base and prime editors, which are independent from the Group I datasets, were used as testing datasets for further validations.The PE2 dataset 55 contains 9 sgRNAs accounting for 1493 validated offtargets.The BE3 dataset is comprised of Kim subset 58 containing 7 sgRNAs accounting for 76 validated off-targets and Liang subset containing 2 sgRNAs accounting for 3 off-targets.The ABE7.10 dataset is comprised of Kim subset 56 containing 7 sgRNAs accounting for 212 validated off-targets and Liang subset 57 containing 6 sgRNAs accounting for 44 off-targets.
The genome-wide off-target loci with a maximum of six nucleotide mismatches were calculated.The benchmark datasets represent different experimental genome-wide CRISPR off-target detection techniques.The experimentally validated off-target sites were defined according to the methods described in the corresponding articles.For example, Change-seq is a highly sensitive in vitro technique, thus, only those data with more than 100 Change-seq reads were considered as active off-targets, following the research of Change-seq 68 .The Cas-Offinder program 69 , which is a versatile tool that searches all potential off-target sites, was used to search for genome-wide off-target sequences with a maximum of 6 mismatches (with no bulges).Because CRISOT-FP did not encode the PAM sequence, we considered only offtarget sites with NGG PAMs.Models are trained on Group I datasets, and tested using Groups II and III datasets.All datasets are available in Zenodo repository 70 .The compared learning-based methods were trained on various datasets that were different from CRISOT-FP.The overlapped sgRNAs between the datasets in this study and the training datasets used by the other learning-based methods were summarized in Supplementary Data 3.

Building machine learning models for CRISPR off-target prediction based on CRISOT-FP
The XGB algorithm is based on tree boosting 33 , which is a fast and scalable learning algorithm and has shown great ability in feature interpretation.We used the XGB classifier to train leave-ones-out models, and predicted off-target effects of independent datasets using the average scores of the trained models.The XGB models used gbtree method as booster, and used logistic regression for binary classification.For benchmark purposes, we also trained supporting vector machine (SVM), logistic regression (LR) and random forest (RF) algorithms models using the scikit-learn package.The SVM classifier models used radial basis function (rbf) kernel, the LR classifier models used limited-memory BFGS (lbfgs) solver, and the RF classifier models used the Gini impurity as criteria.The detailed hyperparameter tuning and configurations are shown in Supplementary Data 7.

Feature importance analysis in CRISOT-FP
The feature importance was calculated by the tree SHAP algorithm 41 , which interprets feature importance scores from the tree-based XGB models.SHAP is a unified approach that can be used to explain the output of the XGB models 71  The training/testing datasets have been deposited in the Zenodo repository 70 .Source data are provided with this paper.

Fig. 1 |
Fig. 1 | Overview of the conceptual framework of CRISOT.a Left, structure of the Cas9-sgRNA-DNA complex.The sgRNA-DNA hybrid is colored red and the interactions are detailed in the box.Right, conceptual representation of the systematic MD simulations.b Illustration of the interaction features.From left to right: hydrogen bonding, binding free energy, atom position, base pair/step geometric features.c Middle panel, encoding of RNA-DNA molecular interaction features to CRISOT-FP for machine learning.Right panel, benchmarking the CRISOT-FP models on different benchmark datasets.Left panel, analysis of the key RNA-DNA molecular interaction features.d Key features are used to develop CRISOT-Score for off-target scoring.eFor a given sgRNA, the genome-wide off-target sites are scored by CRISOT-Score, and those high-scored off-target sites are summarized to derive a specificity evaluation schema -CRISOT-Spec.CRISOT-Spec can be used to evaluate the targeting specificity and the quality of the sgRNA.f When the specificity of an sgRNA is not satisfactory, CRISOT-Opti introduces single nucleotide mutations to change the offtarget profile.By comparing different mutated sgRNAs, CRISOT-Opti can determine the optimized sgRNA as a substitution for the wild-type (WT) sgRNA.

cFig. 2 |
Fig. 2 | Development and evaluation of CRISOT-FP.a Conceptual representation of the RNA-DNA hybrids for MD simulations and feature analysis.Among the 9 bps of the 256 hybrids, only the 3 bps in the middle were mutated differently, generating match or mismatch bps for the middle bp and keeping the other two bps complementary.The features of the middle bps are calculated.The mean values are calculated for the 16 bp types.b Box plots of the average Pearson's correlations of all features.They indicate the capability of a series of features to represent those features with different nearest bps.Center bar, median; box, inter-quartile range; whiskers, low and high quartiles; n = 246 features (see Methods).c Encoding of an sgRNA-DNA hybrid to form the RNA-DNA molecular interaction fingerprints (CRI-SOT-FP).Every bp is encoded by the corresponding RNA-DNA molecular interaction features, so a 20-bp sgRNA-DNA hybrid is encoded by a 3860-bit fingerprints.d, e Comparison of CRISOT-FP with Crista-feat, one-hot and two-hot encoding methods by LGO and LSO evaluations on the Group I (Change-seq and Site-seq) datasets.ROC-AUC (d) and PR-AUC (e) results of LGO and LSO are presented.Source data are provided as a Source Data file.
-seq) dataset with different cutoff of fold-change (FC) NOTE: Bars in gray color are those the training datasets contained all sgRNAs in the testing datasets.Bars with slash hatch are presented by excluding the overlaps.

Fig. 3 |
Fig. 3 | Comparison of CRISOT-FP with the state-of-the-art off-target prediction methods in predicting independent (the Group II) datasets.PR-AUC results for (a) in vitro datasets, (b) in cell datasets and (c) targeted (Surro-seq) dataset with different cutoff of fold-change (FC) are shown, and the ROC-AUC results can be found in Fig. S6-11.The blue and green bars represent learning-based and hypothesis-driven methods, respectively.Bars in gray color are those the training datasets contained all sgRNAs in the testing datasets.Bars with slash hatch are presented by excluding the overlapped sgRNAs.Of note, for fair comparisons, the same sgRNAs were removed from the testing datasets for CRISOT-FP.The overlapped sgRNAs are summarized in Supplementary Data 3.The CRISOT-FP models are trained on the Group I datasets, which are independent to the Group II datasets.Source data are provided as a Source Data file.

Fig. 4 |
Fig. 4 | Feature importance analysis and the evaluation of CRISOT-Score.a Overall SHAP feature importance of the RNA-DNA molecular interaction features.The position importance values on the left are the sums of the rows on the right.b Top 30 important features.The importance values are the sums of the lines in (a).c The ratio of feature importance of various types of features on different positions.d PR-AUC Performances of CRISOT-Score developed using different numbers of key RNA-DNA molecular interaction features for each position.Results on the different benchmark datasets are compared to the results achieved by the CRISOT-FP models.e Comparison of CRISOT-Score with the state-of-the-art off-target scoring methods.The PR-AUC results indicate that CRISOT-Score surpasses the existing Specificity scores vs. number of off-target sites https://doi.org/10.1038/s41467-023-42695-4Nature Communications | (2023) 14:7521

Fig. 5 |
Fig.5| CRISOT-Spec for scoring the targeting specificity and CRISOT-Opti for sgRNA optimization.a Probabilities of being active off-target sites.The probabilities of different CRISOT-Score ranges are shown as the average fractions of active off-targets in the benchmark datasets.b Numbers of potential off-target sites in different CRISOT-Score ranges.We sort the 110 sgRNAs of Change-seq dataset according to the numbers of potential off-target sites and divide them into 11 groups to minimize the difference of numbers of potential off-target sites in each group.Top and bottom three sgRNAs with the least and most experimentally validated off-target sites are selected as the higher and lower specificity sgRNAs, respectively, for each group.The presented result is a summary of the 2-nd to the 10-th groups.c Spearman's correlations between the CRISOT-Spec scores and the experimentally validated numbers of off-target sites reported by the different experiments.The error bands represent the confidence intervals of 95% for the regression estimates.d, e Comparison of CRISOT-Spec with existing sgRNA specificity evaluation tools for predicting the number of off-target sites.Absolute Spearman's correlation scores are shown.The gray bars are results of MOFF-aggregate, a learning-based method.e The results of one-sided paired t-test (n.s.:Not significant, p > 0.05; **p < 0.01; n = 5).The p-values are 0.007, 0.002, 0.001 and 0.757, respectively.f Conceptual framework of CRISOT-Opti optimization.g Comparison of off-target profiles of the WT and optimized sgRNAs.The circos plots show the off-target sites with CRISOT-Score values > 0.6.The scatter points indicate off-targets with CRISOT-Score values > 0.6, and the lines indicate those CRISOT-Score values > 0.65.The blue and green colors indicate the off-target sites of WT and optimized sgRNAs, respectively.The optimized sgRNA is optimized by setting a CRISOT-Score threshold to 0.85.h Summary of the CRISOT prediction results (higher panel) and the Guide-seq experimental results (lower panel).i, Comparison of experimental off-target profiles of the WT and optimized sgRNAs.The blue and green lines of the circos plot indicate the off-target sites of WT and optimized sgRNAs, respectively, which are detected by Guide-seq.Line length indicates the log10-transformed off-target reads.Source data are provided as a Source Data file.

Fig. 6 |
Fig. 6 | Applications of CRISOT for therapeutic gene editing optimization and off-target prediction in BE and Prime editors.a The CRISOT prediction results and WGS experimental results of wild-type and optimized sgRNAs targeting the PCSK9 and BCL11A genes.b Comparison of CRISOT-FP and CRISOT-Score with the existing off-target prediction methods for predicting the off-target effects of base and prime editors using PR-AUC.Learning-based methods are colored in blue, hypothesis-driven methods are colored in green.Bars in gray color are those the training datasets contained all sgRNAs in the testing datasets.Bars with slash hatch are presented by excluding the overlapped sgRNAs.Of note, for fair comparisons, the same sgRNAs were removed from the testing datasets for CRISOT-FP.The overlapped sgRNAs are summarized in Supplementary Data 3.The CRISOT-FP models are trained on the Group I datasets, which are independent to the Group III datasets.c Comparison of CRISOT-Spec with existing sgRNA specificity evaluation tools for predicting the number of off-target sites detected in base and prime editing.The gray bars are results of MOFF-aggregate, a learning-based method.Source data are provided as a Source Data file.

F RD, j i 2 ,F
. SHAP feature importance values were calculated as the mean absolute values of the SHAP scores for each XGB model.The presented SHAP feature importance values were the combination of the SHAP importance values of the models trained on the three datasets.Analysis were performed on models trained on the Change-seq and Site-seq datasets.The SHAP importance values are shown as features vs. positions (Supplementary Data 2).The SHAP importance values of different positions were the sums of all the importance values at different positions.SHAP importance values of different features were the sums of importance values at all 20 positions.Design of CRISOT-ScoreWe selected the top N features of each position to calculate the CRISOT-Score values.For a bp RD (R denotes the representation of sgRNA sequence (U -> T) and D denotes the DNA sequence) at position i, its score was computed asS RD,i = X N . . .,V F RD, j i nð1Þwhere F RD, j is the top j-th feature; V F RD, j i n is the n-th SHAP score values for F RD, j at position i.The meanð½V importance values, overlapped sgRNAs, sgRNA optimization, off-target data, et al. are available as Supplementary Data 1-7.

Table 1 |
Details of the benchmark datasets